Maximal Data Piling in Discrimination
Abstract
In a binary discrimination problem, a linear classifier finds a hyperplane that separates the two classes by partitioning the data space. Especially in a High Dimension Low Sample Size (HDLSS) setting, there exist separating hyperplanes such that the projections of the training data points onto their normal direction vectors are identically zero, or equal to some non-zero constant. Of interest in this paper is a separating hyperplane such that the projections of the training data points from each class onto its normal direction vector take two distinct values, one for each class. This direction vector is uniquely defined in the subspace generated by the data, and a simple formula is given to find it. In non-HDLSS settings, this direction vector is the same as the Fisher Linear Discrimination direction vector.
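The abstract does not reproduce the formula itself. As a rough illustration only, the NumPy sketch below assumes one commonly cited form of the maximal data piling direction, v ∝ Σ⁺d, the Moore-Penrose pseudoinverse of the total sample covariance Σ of the pooled training data applied to the difference d of the two class means; the toy data and all variable names are assumptions for illustration, not taken from the paper.

```python
# Hedged sketch of the maximal data piling (MDP) direction, assuming
# the form v ~ pinv(Sigma_total) @ (mean_pos - mean_neg). The formula
# and all names here are illustrative assumptions, not quoted text.
import numpy as np

rng = np.random.default_rng(0)

# Toy HDLSS training set: dimension 50, only 10 points per class.
dim, n_per_class = 50, 10
X_pos = rng.normal(loc=+1.0, size=(n_per_class, dim))
X_neg = rng.normal(loc=-1.0, size=(n_per_class, dim))
X = np.vstack([X_pos, X_neg])

# Total (global) sample covariance of all training points together.
Xc = X - X.mean(axis=0)
Sigma_total = Xc.T @ Xc / (X.shape[0] - 1)

# Candidate MDP direction: pseudoinverse of the total covariance
# applied to the difference of the class means, then normalized.
mean_diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
v = np.linalg.pinv(Sigma_total) @ mean_diff
v /= np.linalg.norm(v)

# Data piling: within each class, all projections onto v agree up to
# floating-point error, so the projections take exactly two values.
print(np.ptp(X_pos @ v))   # within-class spread, ~1e-12
print(np.ptp(X_neg @ v))   # within-class spread, ~1e-12
print(mean_diff @ v)       # gap between the two piled values
```

On this toy HDLSS sample the within-class spreads print as numerical zeros, i.e., each class piles onto a single projection value. When the covariance matrix is invertible (the non-HDLSS case), Σ⁻¹d is proportional to the Fisher Linear Discrimination direction by a Sherman-Morrison argument, consistent with the equivalence stated in the abstract.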
Similar Resources
Distance Weighted Discrimination
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the appealing discrimination method called the Support Vector Machine can be improved. The revealing concept is data piling at the margin. This leads naturally to the development of Distance Weighted Discrimination, which also is bas...
Distance Weighted Discrimination
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the popular Support Vector Machine suffers from “data piling” at the margin, which can diminish generalizability. This leads naturally to the development of Distance Weighted Discrimination, which is based on Second Order Cone Programmin...
Class-sensitive Principal Components Analysis
Dissertation by Di Miao, under the direction of J. S. Marron and Jason P. Fine. Research in a number of fields requires the analysis of complex datasets. Principal Components Analysis (PCA) is a popular exploratory method. However, it is driven entirely by variation in the dataset, without using any predefined class label information. Linear classifiers make up a fa...
Sparse Distance Weighted Discrimination
Distance weighted discrimination (DWD) was originally proposed to handle the data piling issue in the support vector machine. In this paper, we consider the sparse penalized DWD for high-dimensional classification. The state-of-the-art algorithm for solving the standard DWD is based on second-order cone programming; however, such an algorithm does not work well for the sparse penalized DWD with ...
Geometric Insights into Support Vector Machine Behavior using the KKT Conditions
The Support Vector Machine (SVM) is a powerful and widely used classification algorithm. Its performance is well known to be impacted by a tuning parameter which is frequently selected by cross-validation. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM in the large and small tuning parameter regimes. These insig...
Publication date: 2004